OPTIMIZING DEFRAGMENTATION OPERATIONS IN A
DIFFERENTIAL SNAPSHOTTER



CROSS-REFERENCE TO RELATED APPLICATIONS 
This invention claims priority based on U.S. Provisional Patent Application Serial
No. 60/419,252, filed on October 16, 2002, which is hereby incorporated in its entirety by
reference.

FIELD OF THE INVENTION 
The present invention relates generally to data storage, and more particularly to 
snapshots of file system volumes. 

BACKGROUND OF THE INVENTION 
Data storage is an essential feature of computer systems. Such storage typically
includes persistent data stored on block-addressable magnetic disks and other secondary
storage media. Persistent data storage exists at several levels of abstraction, ranging from
higher levels that are closer to the logical view of data seen by users running application
programs, to lower levels that are closer to the underlying hardware that physically
implements the storage. At a higher, logical level, data is most commonly stored as files
residing in volumes or partitions, which are associated with one or more hard disks. The
file system, which can be regarded as a component of the operating system executing on
the computer, provides the interface between application programs and nonvolatile
storage media, mapping the logically meaningful collection of data blocks in a file to
their corresponding physical allocation units, or extents, located on a storage medium,
such as clusters or sectors on a magnetic disk.

Users and administrators of computer systems benefit from having the ability to
recover earlier versions of files stored on the system. Users may accidentally delete or
erroneously modify files. An administrator of a system that has become corrupted may
wish to recover the entire state of a file system at some known good time before the
corruption occurred. The underlying disk hardware can fail. A snapshot is one
technique for facilitating the recovery of earlier versions of files.

A snapshot of a volume is a virtual volume representing a point in time on the
original volume. Some snapshotters capture the point-in-time data by mirroring the
entire contents of the volume in its snapshot state. By contrast, differential snapshotters
do not copy the contents of the volume at the time of the snapshot. Rather, changes to the
original volume are carefully monitored so that the virtual volume (i.e., the snapshot) can
always be produced. A differential snapshotter will copy a block in the volume only if it is
modified after the snapshot is taken; such a copy operation is called a "copy-on-write."
The snapshot state of the volume can be reconstructed by using these copies of changed
blocks along with the unchanged blocks in the original volume. In the usual case, most
files in the volume will be left unchanged following the snapshot, so differential
snapshotters provide a more economical design than nondifferential approaches. As
many changes occur to the original volume, however, a differential snapshotter must keep
a large area of disk space to hold the older versions of the disk blocks being changed.

In most operating systems, the extents that make up the physical allocation units
implementing a particular file may be discontiguous, as may the pool of allocation units
available as logically free space for use in future file space allocation. A disk volume in
such a state is said to be externally fragmented. In many such operating systems, a
volume can be expected to suffer from increasing external fragmentation over time as
files are added, deleted and modified. External fragmentation increases the time
necessary to read and write data in files, because the read/write heads of the hard disk
drive will have to increase their lateral movement to locate information that has become
spread over many noncontiguous sectors. If fragmentation is sufficiently severe, it can
lead to significantly degraded performance and response time in the operation of the
computer system.

Defragmentation utility programs provide an important remedy for
systems that are prone to external fragmentation. These utilities can be periodically run
to rearrange the physical location of a volume's file extents so that contiguity of
allocation is increased and disk read/write access time is correspondingly reduced,
improving performance. A defragmentation operation consists of moving some blocks in
a file to a location that is free on the volume. More precisely, the contents of one block
are copied to the free block location. The old location of the block becomes free and the
new location of the block becomes occupied space. The defragmentation of a volume
will typically involve an extensive number of such block moves.

Although users of file systems benefit from the disk speed optimizations achieved
by defragmentation, the benefit has come at the expense of efficient use of differential
snapshotters. If a volume is defragmented subsequent to the taking of a snapshot, the
snapshotter will ensure that each data block relocation by the defragmenter is preceded
by a copy-on-write of the block. The logical view of the original volume is unchanged
by the defragmentation operations, but because the disk blocks on which the disk is
physically manifested change drastically in content, the amount of space needed to
maintain the snapshot explodes. This disk space explosion may be enough to destroy a
principal reason for using differential snapshotters in the first place, that of disk space
economy.

The problem seen in the interaction between differential snapshotters and
defragmentation operations is that, prior to the present invention, differential snapshotters
have not been able to distinguish logically significant writes of blocks from logically
insignificant block moves, treating both as requiring copy-on-write protection. This
problem is particularly acute when there is a volume defragmentation operation on the
original volume, but those of skill in the art will appreciate that other file-manipulating
operations may require the nonlogical relocation or shuffling of file blocks.
For example, a program might, for performance reasons, create a file of a
particular size and arrange the blocks in a desired way before proceeding with further use
of the file for writing data. Prior to the present invention, differential snapshotters have
treated such block rearrangements as requiring copy-on-write protection.

It can be seen, then, that there is a need for an improvement in differential
snapshotters so that logically insignificant moves of blocks from one volume location to
another are recognized as not requiring copy-on-write protection in principle. The
availability of more efficient differential snapshotters will make more likely the use of
snapshots applied on a longer-term basis for data recovery. Moreover, such an
improvement will lead to greater use of defragmentation utilities and therefore will allow
disk speed optimizations to take place while having snapshots with little performance
impact and little disk space consumed.

SUMMARY OF THE INVENTION
The present invention provides a method for capturing and maintaining a
differential snapshot of an original volume in which logically significant modifications of
blocks, which require copy-on-write protection, are distinguished from logically
insignificant block moves, which in principle do not need to be preceded by copy-on-write
operations. The invention involves the use of a file system with the ability to pass a
BLOCK_COPY command down to lower-level, block-oriented drivers, a capacity not
available in previous file systems, which enables such drivers to take advantage of
hardware acceleration for data block movements. In particular, a snapshot driver,
informed by the file system that a requested operation is a nonlogical block move, uses
this knowledge to avoid unnecessary copy-on-write operations. Instead,
the snapshotter simply updates the translation table data structures it employs to keep
track of which blocks must be protected by copy-on-write operations and where the
snapshot versions of blocks are being stored.

Those skilled in the art will readily perceive that the present invention is also
applicable to differential snapshots of files and volumes contained on block devices other
than magnetic disk media and to the use of differential snapshotters to reconstruct
time-defined versions of other persistent data structures. Other aspects and advantages of
the invention will become apparent from the following detailed description, taken in
conjunction with the accompanying drawings, illustrating by way of example the
principles of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a flow diagram illustrating the steps taken in an embodiment of the
invention with respect to a block move from a block location A to a block location B;

FIG. 2 is a flow diagram illustrating the steps taken under two scenarios in an
embodiment of the invention following the block move from A to B of FIG. 1 in the case
where, before the move, the snapshotter bitmap bit for block B is set and the bitmap bit
for block A is clear, with FIG. 2A illustrating the scenario where A is written, and with
FIG. 2B illustrating the scenario where there is a write to B;

FIG. 3 is a flow diagram illustrating the steps taken in an embodiment of the
invention following the block move from A to B of FIG. 1 in the case where, before the
move, the bitmap bit for block B is set and the bitmap bit for block A is clear, and where,
after the move, a write of B has not yet occurred and a move of block B to a block
location C is initiated;

FIG. 4 illustrates one possible computer in the context of which an embodiment of
the present invention may be practiced;

FIG. 5 illustrates an exemplary multi-level secondary storage system associated
with a computer, such as the computer of FIG. 4, in the context of which an embodiment
of the present invention may be practiced;

FIG. 6 is a diagram presenting a detailed example of the handling of a logically
significant block write in an embodiment of the invention, with FIG. 6A providing the
view before the write and FIG. 6B providing the view after the write;

FIG. 7 is a diagram presenting a detailed example of the handling of a simple
logically insignificant block move in an embodiment of the invention, with FIG. 7A
providing the view before the block move and FIG. 7B providing the view after the
move;

FIG. 8 is a diagram continuing the detailed example of FIG. 7, presenting the
handling of two logically significant block write requests in an embodiment of the
invention, including a write at a block location from which a data block was nonlogically
moved, and a write at the block location to which that block was moved, with FIG. 8A
providing the view before the writes and FIG. 8B providing the view after the writes;

FIG. 9 is a diagram continuing the detailed example of FIG. 7, presenting the
handling of a second logically insignificant block move following the first move depicted
in FIG. 7, with FIG. 9A providing the view before the block move and FIG. 9B providing
the view after the move;

FIG. 10 is a diagram continuing the detailed example of FIG. 9, presenting the
handling of a third logically insignificant block move following the second move
depicted in FIG. 9, where the move is to the original block location as presented in FIG.
7, with FIG. 10A providing the view before the block move and FIG. 10B providing the
view after the move;

FIG. 11 is a flow diagram presenting a high-level view of the steps taken in an
embodiment of the invention with respect to capturing and maintaining the snapshot;

FIG. 12 is a flow diagram presenting the steps taken in an embodiment of the
invention with respect to the handling of a logically significant write request;

FIG. 13 is a flow diagram presenting the steps taken in an embodiment of the
invention with respect to the handling of a logically insignificant request to move a block;
and

FIG. 14 is a flow diagram presenting the steps taken in an embodiment of the
invention with respect to the handling of a request to read a block in the virtual volume
corresponding to the snapshot of the original volume.

DETAILED DESCRIPTION OF THE INVENTION
A differential snapshotter does not have to perform any copy-on-write operations
on disk space that was logically unused at the time of the snapshot. This is true because
the disk blocks that are free on the snapshot will never need to be read when the
snapshotter produces a logical volume file or directory. For this reason, a differential
snapshotter may have a bitmap of the blocks on the volume. It may set the bit to one bit
value, such as 1, for blocks that are free at the time that the snapshot was taken, and it
may set to the same value the bits corresponding to blocks that have already had a
copy-on-write since the time of the snapshot. Clearly, only bits that have the other bit
value (0 if the first bit value is 1) need to have their blocks copied-on-write. (In the
accompanying drawings, it is assumed that the first bit value, which may be called an
"ignore" value, is 1 and that the second bit value, which may be called a "protect" value,
is 0. However, the invention is of course equally applicable to embodiments which use 0
as the "ignore" value and 1 as the "protect" value.)
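
The bitmap convention described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of any embodiment: the names IGNORE, PROTECT, initial_bitmap, needs_copy_on_write and mark_copied are hypothetical, and the volume is modeled simply as a list of per-block occupancy flags.

```python
IGNORE = 1   # block may be overwritten without a copy-on-write
PROTECT = 0  # block holds snapshot data that must be copied before a write


def initial_bitmap(occupied):
    """Build the snapshot-time bitmap from per-block occupancy flags."""
    return [PROTECT if used else IGNORE for used in occupied]


def needs_copy_on_write(bitmap, block):
    """Only blocks still carrying the 'protect' value need a copy-on-write."""
    return bitmap[block] == PROTECT


def mark_copied(bitmap, block):
    """After a copy-on-write, the block no longer needs protection."""
    bitmap[block] = IGNORE


# Hypothetical four-block volume: blocks 0 and 2 are occupied at snapshot time.
bitmap = initial_bitmap([True, False, True, False])
```

Swapping the two constants gives the equally valid convention in which 0 means "ignore" and 1 means "protect."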

A defragmentation operation consists of moving some blocks in a file to a
location that is free on the volume. The old location of the block becomes free and the
new location of the block becomes occupied. Therefore, it suffices for a differential
snapshotter in accordance with the invention to be informed that a block is moving from
A to B so that it can change its view of what is free space and what is occupied space
without performing any copy-on-write operations but instead simply updating a
translation table.

FIGS. 1-3 illustrate details of an embodiment of the invention in handling a block
move from block A to block B. Turning to FIG. 1, the procedure begins at step 11. The
differential snapshotter is informed that a block is moving from A to B by way of a
BLOCK_COPY command passed down by the file system (step 13), rather than a
READ_BLOCK followed by a WRITE_BLOCK. This tells the differential snapshotter
what operation is taking place. The differential volume snapshotter keeps a bitmap of
one bit for every block, where the bit being set indicates that the snapshotter does not
need to take any action when it is written. A clear bit indicates that the snapshotter has to
make the copy-on-write. The snapshotter keeps a translation table of (Block # -> Device,
Block #) to support reading the snapshot.

If the B bit is clear (step 15), then the snapshotter will copy-on-write the B block
(step 17) before it is written by the move operation (step 19) so that there is an entry in
the table for the B block (step 21) and the B bit is set in the bitmap (step 23).

If the B bit is set in the bitmap, there may or may not be an entry in the table for
the B block. If B is free space at the time of the snapshot then there is no entry in the
table. If the A bit is set (step 27), then the move operation writes B (step 29) and the
snapshotter is done (step 25). There is no point in doing anything if changes to A can be
ignored.

At this point we have reduced this problem to the case where the bit for block B is
set and the bit for block A is clear. Now we let the move happen (step 29) and then
change the bits to the A bit being set (step 31) and the B bit being clear (step 33). We
add two entries to the translation table: (A -> SameDevice, B) (step 35) and (B ->> A)
(step 37), where the ->> symbol is used to denote that B originally comes from A. The
second type of entry provides for fast lookup and, in an embodiment of the invention, it
may be used within the same table data structure as the first type of entry with no extra
overhead. Those of skill in the art will recognize that the two kinds of table entry may
equivalently be kept in two tables, and that reverse lookup may equivalently be
performed in a translation table using only the first type of table entry.
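
The handling of a move from A to B described above can be sketched roughly as follows. This is only an illustrative reading of the text, not the flow diagram itself: the class name Snapshotter, its attributes and its methods are hypothetical, the forward and reverse entries are kept in two dictionaries, and the sketch assumes that after a copy-on-write of B the procedure continues with the remaining checks (one interpretation of "we have reduced this problem"). It does not handle a source block that is itself the target of an earlier move.

```python
IGNORE, PROTECT = 1, 0  # bit set = ignore, bit clear = protect


class Snapshotter:
    """Hypothetical in-memory model of the bitmap and translation tables."""

    def __init__(self, volume, bitmap):
        self.volume = volume   # current contents of each block
        self.bitmap = bitmap   # one bit per block
        self.diff_area = []    # differential storage for preserved blocks
        self.forward = {}      # block -> ("same", block) or ("diff", offset)
        self.reverse = {}      # current location ->> snapshot-time block

    def copy_on_write(self, block):
        """Preserve a protected block in the differential area."""
        self.diff_area.append(self.volume[block])
        self.forward[block] = ("diff", len(self.diff_area) - 1)
        self.bitmap[block] = IGNORE

    def block_copy(self, a, b):
        """Handle a logically insignificant move of block a to block b."""
        if self.bitmap[b] == PROTECT:    # B itself holds snapshot data
            self.copy_on_write(b)        # table entry for B, B bit set
        if self.bitmap[a] == IGNORE:     # changes to A can be ignored
            self.volume[b] = self.volume[a]
            return
        self.volume[b] = self.volume[a]  # let the move happen
        self.bitmap[a] = IGNORE          # A bit set
        self.bitmap[b] = PROTECT         # B bit clear
        self.forward[a] = ("same", b)    # (A -> SameDevice, B)
        self.reverse[b] = a              # (B ->> A)


# Block 0 is occupied at snapshot time; block 1 is free space.
snap = Snapshotter(volume=["a", None], bitmap=[PROTECT, IGNORE])
snap.block_copy(0, 1)
```

In this sketch no data is written to the differential area for the common defragmentation case, which is the space saving the text describes.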

FIG. 2 continues the illustration of FIG. 1 where, originally, the bit for block B
was set and the bit for block A was clear, presenting the steps taken by the snapshotter
with respect to a subsequent write of block A in FIG. 2A and a subsequent write of block
B in FIG. 2B. In FIG. 2A, following the completion of the steps illustrated in FIG. 1 (step
41), henceforth A can be written freely (steps 43, 45), as its bit is set. In FIG. 2B,
following the completion of the steps illustrated in FIG. 1 (step 51), a command to write
to B in step 53 will result in a copy-on-write of B (step 55) followed by the write (step
57). The copy-on-write of B will be added to the table in place of the previous entry
(A -> SameDevice, B), yielding (A -> DiffAreaVolume, DiffAreaVolumeOffset) (step
59), the deletion of the (B ->> A) entry (step 61), and the setting of the B bit (step 63).
DiffAreaVolume and DiffAreaVolumeOffset represent the differential storage space
volume device and block number, respectively, to which block B is copied.
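
The two write scenarios of FIG. 2 can be sketched as below. This is a hedged illustration rather than the claimed method: the function write_block and the dictionary layout of state are assumptions made for the example, with the forward and reverse table entries held in separate dictionaries and the differential area modeled as a list.

```python
def write_block(state, block, data):
    """Handle a logically significant write to `block` (sketch only)."""
    if state["bitmap"][block] == 0:              # protected: copy-on-write first
        state["diff"].append(state["volume"][block])
        offset = len(state["diff"]) - 1
        # The preserved data is the snapshot version of the block named in the
        # reverse entry, or of this block itself if no move was recorded.
        origin = state["reverse"].pop(block, block)  # delete (B ->> A)
        state["forward"][origin] = ("diff", offset)  # (A -> DiffArea, offset)
        state["bitmap"][block] = 1                   # set the B bit
    state["volume"][block] = data                    # perform the write


# State just after the move of A (block 0) to B (block 1) described for FIG. 1.
state = {
    "volume": ["a", "a"],
    "bitmap": [1, 0],
    "diff": [],
    "forward": {0: ("same", 1)},
    "reverse": {1: 0},
}
write_block(state, 0, "x")  # as in FIG. 2A: A's bit is set, written freely
write_block(state, 1, "y")  # as in FIG. 2B: B is protected, copied first
```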

FIG. 3 continues the illustration of FIG. 1 where, originally, the bit for block B
was set and the bit for block A was clear, the steps associated with the move from A to B
have occurred (through step 37 of FIG. 1), and a subsequent write of B has not yet
occurred (step 69). In step 71, a move of block B to block C is initiated. The rules
presented in FIG. 1 then apply, with block B now the old location (corresponding to
block A in FIG. 1) and block C the new location (corresponding to block B in FIG. 1).
The B bit is clear (from step 33 in FIG. 1). If the C bit is clear (step 73), then the
snapshotter will copy-on-write the C block (step 75) before it is written by the move
operation (step 77) so that there is an entry in the table for the C block (step 79) and the C
bit is set in the bitmap (step 81).

If, prior to the move, the C bit is set, we let the move happen (step 85) and then
change the bits to the B bit being set (step 87) and the C bit being clear (step 89).
However, in preparing to insert (B -> SameDevice, C) into the translation table, we find the
(B ->> A) table entry in place. At this point, the snapshotter effects a composition,
yielding the entries (A -> SameDevice, C) (step 91) and (C ->> A) (step 93), which
would replace (A -> B) and (B ->> A) (steps 95, 97).
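
The composition step can be illustrated with a small sketch. The helper record_move and the two dictionaries are hypothetical names chosen for this example; the sketch covers only the table bookkeeping, not the bitmap updates or the data movement.

```python
def record_move(forward, reverse, src, dst):
    """Record that protected data at src now lives at dst, composing chains."""
    # If src holds the snapshot version of an earlier block, a reverse entry
    # (src ->> origin) exists; remove it and keep tracking the origin.
    origin = reverse.pop(src, src)
    forward[origin] = ("same", dst)  # (origin -> SameDevice, dst)
    reverse[dst] = origin            # (dst ->> origin)


forward, reverse = {}, {}
record_move(forward, reverse, "A", "B")  # yields (A -> B) and (B ->> A)
record_move(forward, reverse, "B", "C")  # composes to (A -> C) and (C ->> A)
```

Because of the reverse entry, the second move is resolved with a single lookup instead of a scan of the forward table.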

FIGS. 4-14 illustrate aspects of embodiments of the invention in further detail.
FIG. 4 illustrates one exemplary computing environment 100 within which the present
invention may be performed. The environment 100 includes a general-purpose
stored-program computer machine 110, which may be connected to one or more other
computer-based resources, such as a remote computer 180 connected to the computer
device 110 by a local area network 171 or wide area network 173. The computer machine
110 includes at least one central processing unit 120 connected by a system bus 121 to a
primary memory 130. One or more levels of a cache 122, connected to or situated within
the processing unit 120, act as a buffer for the primary memory 130. Programs, comprising
sets of instructions for the machine 110, are stored in the memory 130, from which they
can be retrieved and executed by the processing unit 120. In the course of executing
program instructions, the processing unit 120 retrieves data 137 stored in the memory 130
when necessary. Among the programs and program modules stored in the memory 130
are those that comprise an operating system 134.

The exemplary computer machine 110 further includes various input/output
devices and media for writing to and reading from the memory 130, including secondary
storage devices such as a non-removable magnetic hard disk 141, a removable magnetic
disk 152, and a removable optical disk 156. Such computer-readable media provide
nonvolatile storage of computer-executable instructions and data; the hard disk 141 is
also commonly used along with the primary memory 130 in providing virtual memory. It
will be appreciated by those skilled in the art that other types of computer-readable media
that can provide volatile and nonvolatile storage of data accessible by a computer may
also be used in the exemplary computer environment 100. The computer 110 has a file
system 142 associated with the operating system 134. The file system 142 serves as an
interface that maps a set of logically-organized named files to data physically stored on
secondary media, such as data stored on the hard disk 141.

The diagram of FIG. 5 illustrates an exemplary multi-level secondary storage
system associated with a computer such as the computer depicted in FIG. 4, in the
context of which an embodiment of the invention may be practiced. A differential
snapshotter 211 may be regarded as a driver that mediates between the file system 207
and a block driver 215. The block driver 215 provides sector-level access to data
contained in volumes 221, 225 corresponding to hard disks 219, 223. The snapshotter
211 accesses data at the sector level through the block driver 215. Executing programs
201, 205, such as a disk defragmentation utility 203, access stored data at a higher,
logical level through the file system interface 207.

The differential snapshotter 211 is directed to take a snapshot 217 of an original
disk volume 221 at a specified point in time. The snapshot is a virtual volume 217
containing the versions of files in the volume 221 as they existed at the time of the
snapshot. Initially, no copying of data in the original volume 221 is done by the
differential snapshotter 211. After the time of the snapshot, the snapshotter 211 monitors
and intercepts efforts by the file system 207 to access data blocks in the original volume
221 on behalf of executing programs 201, 203, 205. If the file system 207 attempts to
write new data to a block, the snapshotter 211 first consults a bitmap 209 to determine
whether it must preserve the data in that block with a copy-on-write operation before the
write attempt can proceed. If a copy-on-write is necessary, the snapshotter 211 writes the
copy to a special differential storage area 227, possibly stored in another volume 225 on
another disk 223, recording information identifying the copied block and the location in
which it was copied in one or more table data structures 213.

In embodiments of the invention, the file system 207 has the capacity to pass a
BLOCK_COPY command to lower-level drivers, enabling lower-level drivers to take
advantage of hardware acceleration for data block copies. In particular, the file system
can pass the BLOCK_COPY command down to the snapshot driver 211 to request a
logically insignificant relocation of a block from one block location to another in the
volume 221. Having received the BLOCK_COPY request, which signifies that the
requested data movement is not logically significant, the snapshotter 211 may be able to
avoid performing a copy-on-write by using the bitmap 209 and tables 213 in a manner
described in further detail below.

The snapshotter 211 also enables the file system 207 to read snapshot versions of
files. To the file system 207 the snapshot virtual volume 217 appears to be another block
device, which the file system 207 can mount. If a requested file that was in the original
volume at the time of the snapshot has been logically changed or nonlogically moved
since the time of the snapshot, the snapshotter 211, consulting its tables 213, will redirect
the read request to the appropriate location in the differential storage space 227 or in the
original volume 221 where that snapshot version is stored.
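
The redirection of snapshot reads can be sketched as follows. The function read_snapshot_block, the tuple encoding of table entries, and the sample data are assumptions made for illustration; the text only specifies that the tables map a block to a location in the differential storage space or in the original volume.

```python
def read_snapshot_block(block, volume, diff_area, table):
    """Return the snapshot-time contents of `block` (illustrative sketch)."""
    where = table.get(block)
    if where is None:
        return volume[block]      # unchanged since the snapshot
    device, offset = where
    if device == "diff":
        return diff_area[offset]  # preserved earlier by a copy-on-write
    return volume[offset]         # nonlogically moved within the same volume


# Hypothetical state: block 0 was overwritten after a copy-on-write, and the
# data of block 2 was nonlogically moved to block 1.
volume = ["new0", "old2", "junk", "old3"]
diff_area = ["old0"]
table = {0: ("diff", 0), 2: ("same", 1)}
snapshot_view = [read_snapshot_block(b, volume, diff_area, table) for b in (0, 2, 3)]
```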

As mentioned above, a bitmap 209 is used by the snapshotter 211 to determine
whether a particular block location must be protected by a copy-on-write operation. In
the bitmap 209, a particular bit represents a particular block in the volume 221. When the
snapshot is captured, a subset of the blocks in the volume 221 will be logically occupied,
in the sense that they are at that moment being used to implement existing files. Another
subset of blocks will constitute logically free space. In the initial configuration of the
bitmap 209, all occupied-space blocks will have their corresponding bits set to "protect,"
and all free-space blocks will have their bits set to "ignore," because there is no reason to
perform a copy-on-write for a block that was logically insignificant at the time of the
snapshot. In the embodiment illustrated in the examples of FIGS. 1-3 above and in the
examples discussed below, the "ignore" value is 1 and the "protect" value is 0. It should
be noted that once a copy-on-write is performed for a particular block, it is no longer
necessary for the snapshotter 211 to protect that block.

Referring now to FIG. 6, the depicted example illustrates how the snapshotter
handles the straightforward case of a logically significant request to write a block
location. In FIG. 6A, the snapshotter has intercepted a WRITE_BLOCK call 301 from
the file system, which seeks to write data 303 at the block location here designated C03
307. The bit 317 in the bitmap 319 corresponding to this block is 0, so the block 307
must be protected with a copy-on-write operation 311 copying its data to differential
storage space 313 located on a volume 315. FIG. 6B presents the view after the
copy-on-write has taken place and after the write of block C03 323 has been permitted to
go forward. The bit 343 corresponding to this block 323 is set to 1, since no further
protection of the snapshot version of this block will be needed. The copy-on-write has
been made at location D01 341 in the differential storage space 331. A table data
structure 333, mapping blocks 327 to the location 329 at which the snapshot versions of
those blocks are stored, records the fact that block C03 335 has been copied to
differential location D01 337.

Referring now to FIG. 7, the depicted example shows the simplest case involving
a logically insignificant block move, such as that which might be requested by the file
system during the execution of a disk defragmentation operation following the time of the
snapshot. The example illustrates how a copy-on-write operation is avoided in such a
situation without any loss of information regarding the contents and location of the
snapshot version of the protected block. FIG. 7A represents the situation after the request
is intercepted but before it is permitted to proceed. The snapshotter is made aware of the
nonlogical nature of the requested operation by the file system's use of a BLOCK_COPY
call 405, in accordance with the invention, instead of READ_BLOCK and
WRITE_BLOCK calls. Here the request involves the relocation of the data in block C03
407 to block C08 409 in the same volume 401. In the bitmap 403, the bit 413
corresponding to block C03 407 is 0, so some effort must be made to preserve the data in
this block 407 as the snapshot version of block C03 407. The bit 415 corresponding to
the destination block 409 is set to 1, as might be expected if the requested move is a
defragmentation operation selecting a current free-space location in the volume 401 as
the new location for the block data being moved. If the bitmap bits 413, 415
corresponding to blocks C03 407 and C08 409 in FIG. 7A had been other than 0 and 1,
respectively, the snapshotter would have handled the BLOCK_COPY request 405
differently. This will be explained below in the discussion of the flow diagram of FIG.
13.

As a consequence of the requested block move, a logically occupied block, which
is one of the blocks that must be protected by the snapshotter, becomes free space, and a
free-space block becomes occupied space. This change can be reflected in the bitmap
simply by exchanging the bit values 411 in the two bits 413, 415 corresponding to the
two blocks 407, 409 involved in the move. FIG. 7B depicts the situation after the block
move has taken place. Block C08 425 now holds the data that was previously held in
block C03 421, and the corresponding bits 423, 427 in the bitmap 419 have been
switched. The relocation of the snapshot version of block C03 435 to block C08 437 is
recorded in the table 429. The mapping here is a translation to another offset in the
volume 417. If the snapshotter receives a request to read the snapshot version of block
C03, it will look up C03 435 in the table 429 and find that the snapshot copy is currently
located at C08 437. The read request will be directed to block C08 425.

Referring now to FIG. 8, the depicted example proceeds from the state of FIG.
7B. In FIG. 8A, two logically significant WRITE_BLOCK requests 551, 553 are
received for the respective block locations C03 507 and C08 509, the same locations that
were involved in the preceding logically insignificant move. The request 551 to write
block C03 507 will be allowed by the snapshotter without further action, since its
corresponding bit 513 in the bitmap 503 is set to 1, indicating that it can be written freely.
The bit 515 corresponding to block C08 509, however, is 0, so it must be protected with a
copy-on-write before it can be written. FIG. 8B illustrates the situation following the
writes. Blocks C03 521 and C08 525 now hold the new data. The bitmap bit 523
corresponding to block C03 521 remains 1, of course. The bit 527 corresponding to
block C08 525 is set to 1 following the copy-on-write 543 depicted in FIG. 8A. The
copy-on-write 543 copied the old value of C08 509, which is the snapshot version of
current block C03 521, in location D02 547 in the differential storage space 549. In the
storage/translation table 529, the mapping 537 for block C03 535 is updated accordingly,
recording D02 541 as the current location of the snapshot block C03 539.

Although the diagrams of FIGS. 6-10 show a single mapping table for illustrative
simplicity, an additional reverse mapping table may be used. This reverse mapping table
may be stored as part of the same data structure as the direct-mapping translation table, as
in the flow diagrams of FIGS. 1-3, or, in the alternative, it may be maintained as a
separate data structure. A reverse mapping table entry provides, for fast lookup, the
mapping from a first block in the original volume to a second block in the same volume,
the second block signifying the location whose snapshot version the first block is holding.
In the example of FIG. 8, the snapshotter looks up C08 in the reverse mapping table,
finding C08 mapped to C03, the block location of C08's data at the time of the snapshot.

While the case of FIGS. 7 and 8 is one in which there was ultimately no net
benefit in the original avoidance of a copy-on-write, in general it is impossible to predict
whether there will be a logically significant write to a block that has previously been the
subject of a logically insignificant move. In the case of a block move pursuant to a
defragmentation operation, it is particularly likely that the benefit of avoiding the
copy-on-write will be preserved, since the defragmentation of an entire volume of blocks
will involve many moves, only a small number of which can be expected to be the subject
of subsequent logical writes.

Referring now to FIG. 9, the example depicted therein proceeds from the state of
FIG. 7B and illustrates how the snapshotter handles the move of a previously-moved
block. In FIG. 9A, the snapshotter intercepts a file system BLOCK_COPY command
605 for a logically insignificant move 643 from block C05 609 to block C10 607, in
accordance with the invention. The bitmap bits 615, 613 for these blocks are 0 and 1
respectively, as in the example of FIG. 7, and again the bits 615, 613 will be exchanged
645 in order to update the bitmap 603 to reflect the changed block configuration. The
snapshotter looks up C05 637 in the reverse mapping table corresponding to the depicted
table 629, finding the reverse mapping to C03, signifying that block C05 609 is the
current location of the snapshot version of block C03 635. As shown in FIG. 9B,
representing the state after the data previously stored in block C05 625 has been moved
to C10 653, the table 655 is updated so that C03 647 is mapped compositionally to C10
649 rather than to C05 641. The bits 627, 651 corresponding to blocks C05 625 and C10
653 respectively have been exchanged, with C10's bit 651 now having the protect value
0.

Referring now to FIG. 10, the example of FIG. 9 is continued in FIG. 10A, with a
file system attempt 715 to nonlogically move the data in block C10 709 to block C03
705, using the BLOCK_COPY command 713 in accordance with the invention. The
move destination 705 is also the snapshot-time location of data currently stored in C10
709. The bitmap bits 711, 707 corresponding to blocks C10 709 and C03 705 are 0 and 1
respectively, and the bits are exchanged 717, as seen in FIG. 10B following the move,
where C10's bit 743 is now 1 and C03's bit 741 is 0, as in the original bitmap 703. A
lookup of C10 731 in the reverse mapping table corresponding to the depicted table 719
reveals C10 731 to be the current location of the snapshot version of block C03 111. The
appropriate update to the table 745 is the entry 761, 755 mapping C03 to C03, but this is
a cycle that may simply be removed from the table. Thus, with respect to block C03 735,
the snapshot-time status quo has been restored.

The algorithms applied in the previous examples are presented in further detail in
the flow diagrams of FIGS. 11-14. FIG. 11 represents a procedural overview of an
embodiment of the invention. At step 800 the procedure is begun. In step 802 the
snapshotter captures a snapshot of an original disk volume at a point in time, following
which, in step 804, it creates the associated bitmap, initially assigning 1 (the "ignore"
value) to logically free blocks and 0 (the "protect" value) to logically occupied blocks. In
step 806 the snapshotter assumes the role of monitoring file system requests to access
blocks in the original volume, as well as the role of enabling the file system to read the
snapshot virtual volume. The method relating to the snapshot of step 802 terminates in
step 808.
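
The bitmap creation of step 804 might, purely as an illustrative sketch, be expressed in Python as follows. The function name create_bitmap, the constants IGNORE and PROTECT, and the representation of the occupied blocks as a set of block numbers are assumptions made for the example; the sketch also assumes that every block not logically occupied receives the "ignore" value.

```python
IGNORE, PROTECT = 1, 0  # bit values named after the "ignore" and "protect" roles


def create_bitmap(num_blocks, occupied):
    """Return one bit per block: PROTECT for occupied blocks, IGNORE for all others."""
    occupied = set(occupied)
    return [PROTECT if block in occupied else IGNORE for block in range(num_blocks)]


# Hypothetical eight-block volume in which blocks 0, 3 and 4 hold data at snapshot time.
bitmap = create_bitmap(8, occupied={0, 3, 4})
print(bitmap)
```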

FIGS. 12-14 expand upon the post-snapshot step 806 of FIG. 11. These diagrams,
like the flow diagrams of FIGS. 1-3, assume that the snapshotter maintains one
translation table holding up to two mappings for each original volume block entry a. One
mapping, denoted a -> b, signifies that block b currently stores the snapshot copy of a. A
second mapping, denoted a ->> c, the reverse mapping referred to above, signifies that
block a currently stores the snapshot copy of c.

The flow diagram of FIG. 12 presents the steps associated with the interception of
a logically significant WRITE_BLOCK from the file system. Following the entry into
the procedure (step 900), in step 902 the snapshotter detects an effort by the file system to
logically write block k in the original volume. In step 904, the snapshotter checks the
value of the corresponding bit in the bitmap. If this bit is 1, the file system write can
proceed (step 914) and the snapshotter exits the procedure (step 916). If the bit is 0, the
block data must be protected. A copy-on-write operation copies the block to a
differential storage location d (step 906), and the bit corresponding to the copied block is
set to 1 (step 908), permitting subsequent accesses of the block to be ignored.

In step 910 the snapshotter determines whether there is an entry k ->> j in the
table, reverse-mapping k to some block j in the original volume. If so, block k is the
current location of the snapshot version of block j. The snapshotter removes this reverse
mapping (step 918) and the corresponding direct mapping j -> k from the table (step
920). It makes a new table entry j -> d, recording differential storage location d as the
current location of the snapshot version of j (step 922). At step 914 the file system is
permitted to write block k, and the snapshotter then exits (step 916). If, however, there
was no reverse-mapping entry for k in the table, the snapshotter makes an entry k -> d in
the table (step 912). Block k can then be written by the file system (step 914), and the
algorithm terminates (step 916).
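
The write-interception procedure of FIG. 12 might be sketched in Python as follows, as an illustration only. The function name handle_write, the use of dictionaries for the volume, bitmap, and mapping tables, and the naming of differential storage locations as "D0", "D1", and so on are all assumptions of the sketch rather than features of the described embodiment.

```python
def handle_write(k, new_data, volume, bitmap, direct, reverse, diff_store):
    """Intercept a logically significant write of new_data to block k."""
    if bitmap[k] == 0:                    # step 904: the block is protected
        d = "D%d" % len(diff_store)       # hypothetical name for the next differential location
        diff_store[d] = volume[k]         # step 906: copy-on-write of the old contents
        bitmap[k] = 1                     # step 908: later writes to k need no protection
        if k in reverse:                  # step 910: k holds the snapshot copy of some block j
            j = reverse.pop(k)            # step 918: remove k ->> j
            del direct[j]                 # step 920: remove j -> k
            direct[j] = d                 # step 922: add j -> d
        else:
            direct[k] = d                 # step 912: add k -> d
    volume[k] = new_data                  # step 914: the write proceeds


# State assumed to follow an earlier move: block 5 holds the snapshot data of block 3.
volume = {3: "stale", 5: "snap3"}
bitmap = {3: 1, 5: 0}
direct, reverse, diff_store = {3: 5}, {5: 3}, {}

handle_write(3, "new3", volume, bitmap, direct, reverse, diff_store)  # bit is 1: no copy
handle_write(5, "new5", volume, bitmap, direct, reverse, diff_store)  # bit is 0: copy-on-write
print(direct, reverse, diff_store)
```

In this usage the second write displaces the snapshot data of block 3 into differential storage, so the direct mapping for block 3 is redirected there and the reverse mapping for block 5 disappears.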

The flow diagram of FIG. 13 presents the steps associated with the interception of
a file system attempt to nonlogically move a block of data from one block location j to
another block location k in the volume. The snapshotter enters the procedure (step 1000)
and receives the move request (step 1002). The bitmap bits for the source and destination
blocks are examined respectively in steps 1004 and 1006. If the bit corresponding to
block j is 1, or if the bit corresponding to block k is 0, the snapshotter will treat the
request as a READ_BLOCK on j to be followed by a WRITE_BLOCK on k using the
data stored in j (step 1007). To handle the WRITE_BLOCK on k, the snapshotter follows
the procedure outlined in FIG. 12 (step 1009).

If the bit corresponding to j is 0 and the bit corresponding to k is 1, the
optimization associated with the invention can be realized. The snapshotter determines
whether there is a reverse-mapping entry j ->> i in the table mapping j to some block i
in the same volume (step 1008). If so, j is currently storing the snapshot version of block
i. The direct-mapping table entry i -> j is deleted (step 1010), and the corresponding
reverse-mapping table entry j ->> i is deleted (step 1012). If i and k are not the same
block location, determined at step 1014, a direct-mapping entry i -> k is added to the
table (step 1016), as is the corresponding reverse mapping k ->> i (step 1018). These
two steps are skipped if i and k are the same. In either case, the bits corresponding to j
and k are swapped (step 1024), the block move is allowed to proceed (step 1040), and the
procedure terminates (step 1042), the block move having been achieved without a
copy-on-write operation.
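
As a hedged illustration, the move-interception procedure of FIG. 13 might be sketched in Python as follows. The function name handle_move, the dictionary representations, and the write_block callback standing in for the procedure of FIG. 12 are hypothetical. The branch taken when the source block has no reverse mapping is an assumption inferred from the examples of FIGS. 7 and 8, in which a first move leaves the destination reverse-mapped to the source; it is not spelled out in the step description above.

```python
def handle_move(j, k, volume, bitmap, direct, reverse, write_block):
    """Intercept a nonlogical move of block j to block k; return True if optimized."""
    if bitmap[j] == 1 or bitmap[k] == 0:          # steps 1004 and 1006
        write_block(k, volume[j])                 # steps 1007 and 1009: read j, then write k
        return False
    if j in reverse:                              # step 1008: j holds the snapshot copy of i
        i = reverse.pop(j)                        # step 1012: remove j ->> i
        del direct[i]                             # step 1010: remove i -> j
        if i != k:                                # step 1014
            direct[i] = k                         # step 1016: add i -> k
            reverse[k] = i                        # add the matching reverse mapping k ->> i
    else:
        # ASSUMPTION: j still holds its own snapshot-time data, so record j -> k and k ->> j.
        direct[j] = k
        reverse[k] = j
    bitmap[j], bitmap[k] = bitmap[k], bitmap[j]   # step 1024: swap the two bits
    volume[k] = volume[j]                         # step 1040: the move itself
    return True


# Blocks 3, 5 and 10 stand in for C03, C05 and C10; only block 3 is occupied at snapshot time.
volume = {3: "A", 5: None, 10: None}
bitmap = {3: 0, 5: 1, 10: 1}
direct, reverse = {}, {}
fallbacks = []  # destinations of moves that had to take the WRITE_BLOCK path


def record(k, data):
    fallbacks.append(k)


handle_move(3, 5, volume, bitmap, direct, reverse, record)    # first move
after_first = (dict(direct), dict(reverse))
handle_move(5, 10, volume, bitmap, direct, reverse, record)   # move of a previously moved block
after_second = (dict(direct), dict(reverse))
handle_move(10, 3, volume, bitmap, direct, reverse, record)   # move back to the original location
print(after_first, after_second, direct, reverse, bitmap)
```

The three calls retrace the sequence of moves discussed for FIGS. 7, 9 and 10: the mapping for block 3 is first created, then redirected compositionally, and finally removed as a cycle when the data returns to its snapshot-time location.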

Finally, the flow diagram of FIG. 14 presents the steps taken by the snapshotter
in enabling the file system to read the virtual snapshot volume. The procedure begins at
step 1100, and at step 1102 a file system request to read a particular block v in the
snapshot volume is received. The snapshotter determines whether there is an entry
v -> w in the table (step 1104). If such an entry exists, it signifies that the snapshot copy
of block v is stored at another location w, either in the same volume or in the differential
storage space. The snapshotter directs the file system read to w (step 1106), and the
procedure terminates (step 1110). If there is no entry for v in the table, the snapshot copy
of block v is the same as the current contents of block v in the original volume. The
snapshotter therefore directs the read to the actual block v (step 1108), and the procedure
terminates (step 1110).
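
The read procedure of FIG. 14 might be sketched in Python as follows, for illustration only. The function name read_snapshot_block and the convention that differential storage locations are string keys in a separate dictionary, while in-volume locations are integers, are assumptions of the sketch.

```python
def read_snapshot_block(v, volume, direct, diff_store):
    """Return the snapshot-time contents of block v."""
    w = direct.get(v)              # step 1104: is there an entry v -> w?
    if w is None:
        return volume[v]           # step 1108: block v is unchanged since the snapshot
    if w in diff_store:
        return diff_store[w]       # step 1106: w lies in the differential storage space
    return volume[w]               # step 1106: w is another block in the same volume


# Hypothetical state: block 3 was copied to differential storage, block 7 was moved to block 9.
volume = {3: "new3", 4: "same", 7: "other", 9: "snap7"}
diff_store = {"D0": "snap3"}
direct = {3: "D0", 7: 9}

results = [read_snapshot_block(v, volume, direct, diff_store) for v in (3, 4, 7)]
print(results)
```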

The foregoing detailed description discloses a method for capturing and
maintaining a differential snapshot in which logically significant writes are
distinguished from logically insignificant moves of block data. The ability of the
snapshotter to make this distinction is accomplished by an innovation in the file system
whereby a BLOCK_COPY command can be passed to drivers below the file system
level, which also enables those drivers to take advantage of hardware acceleration of data
block copies. With respect to the differential snapshot, substantial economies of
processing time and storage space are achieved. While, as those skilled in the art will
readily recognize, the invention is susceptible to various modifications and alternative
constructions, certain illustrative embodiments have been shown in the accompanying
drawings and have been described above in detail. It should be understood, however, that
there is no intention to limit the invention to the specific forms disclosed. On the
contrary, the intention is to cover all modifications, alternative constructions, and
equivalents falling within the spirit and scope of the invention.